Multiple Imputations Using Sequential Semi and Nonparametric Regressions
Abstract
Multiple imputation is a general-purpose method for analyzing data with missing values. Under this approach, the missing values are replaced by several plausible sets of values to yield completed data sets. Each completed data set is then analyzed separately, and the results (estimates, standard errors, test statistics, etc.) are combined to form a single inference. It is fairly well established that the imputations should be draws from a predictive distribution of the missing values and should condition on as many covariates as possible. A sequential regression imputation method uses an iterative process in the style of Gibbs sampling: it imputes the missing values in any given variable by drawing from the predictive distribution of a conditional regression model that uses all other variables as predictors, cycling through a sequence of such models. The conditional regression models are usually parametric. In practice, however, many variables have distributions that are very difficult to classify or to transform so that they satisfy standard parametric distributional assumptions. We develop and evaluate a modification of this method. For a given variable, we construct a propensity score for that variable being missing and a predicted value of that variable. We stratify the sample on these two scores and then, within each stratum, use the approximate Bayesian bootstrap or Tukey's gh distribution to impute the missing values conditional on the observed values. We illustrate the proposed method using actual and simulated data sets.
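The stratify-then-bootstrap step, followed by the combination of completed-data results, can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation. Assumptions: only one variable `y` has missing values and the covariates `X` are fully observed (in the sequential procedure this step would be cycled over each incomplete variable); the propensity score comes from a logistic regression and the predicted value from a linear regression; the strata are a 3 x 3 cross-classification of quantile bins of the two scores; a stratum with no observed donors falls back to all observed cases; and the completed-data estimates are combined with Rubin's rules. All function names are hypothetical, and the Tukey gh alternative is not shown.

```python
import numpy as np


def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])


def fit_logistic(X, r, n_iter=25):
    """Newton-Raphson fit of P(missing | X); X must already contain an intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1 - p))[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta += np.linalg.solve(hess, X.T @ (r - p))
    return beta


def strata_labels(score, k):
    """Assign each case to one of k quantile bins of a score (labels 0..k-1)."""
    edges = np.quantile(score, np.linspace(0, 1, k + 1))[1:-1]
    return np.digitize(score, edges)


def abb_impute_once(y, X, rng, k=3):
    """One stratified approximate-Bayesian-bootstrap imputation of y (NaN = missing)."""
    Xi = add_intercept(X)
    miss = np.isnan(y)
    # Score 1 (assumed logistic model): the linear predictor of the missingness
    # propensity; its quantile bins match those of the propensity (monotone link).
    prop = Xi @ fit_logistic(Xi, miss.astype(float))
    # Score 2 (assumed linear model): predicted y, fit on the observed cases only.
    coef = np.linalg.lstsq(Xi[~miss], y[~miss], rcond=None)[0]
    pred = Xi @ coef
    # Cross-classify every case into one of k * k strata on the two scores.
    cell = strata_labels(prop, k) * k + strata_labels(pred, k)
    y_imp = y.copy()
    for c in np.unique(cell[miss]):
        targets = np.where(miss & (cell == c))[0]
        donors = y[~miss & (cell == c)]
        if donors.size == 0:  # hypothetical fallback: pool all observed cases
            donors = y[~miss]
        # ABB: resample the donors with replacement, then draw the imputations
        # with replacement from that bootstrap sample.
        boot = rng.choice(donors, size=donors.size, replace=True)
        y_imp[targets] = rng.choice(boot, size=targets.size, replace=True)
    return y_imp


def rubin_combine(estimates, variances):
    """Combine M completed-data estimates with Rubin's rules; returns (Qbar, T)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    b = q.var(ddof=1)  # between-imputation variance
    return q.mean(), u.mean() + (1 + 1 / m) * b


# Usage on simulated data: heavy-tailed errors that a normal model would misspecify,
# with missingness depending on X only (missing at random).
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.standard_t(df=3, size=n)
p_miss = 1 / (1 + np.exp(-(-1.0 + X[:, 0])))
y_obs = np.where(rng.uniform(size=n) < p_miss, np.nan, y)

M = 5
completed = [abb_impute_once(y_obs, X, rng) for _ in range(M)]
est = [c.mean() for c in completed]              # completed-data estimate of the mean
var = [c.var(ddof=1) / n for c in completed]     # its completed-data variance
qbar, total_var = rubin_combine(est, var)
```

Because every imputed value is drawn from observed values in the same stratum, the approach makes no parametric assumption about the distribution of `y` within a stratum; the parametric models are used only to build the two stratifying scores.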
Similar resources
Practice of Epidemiology: Multiple Imputation for Missing Data via Sequential Regression Trees
Multiple imputation is particularly well suited to deal with missing data in large epidemiologic studies, because typically these studies support a wide range of analyses by many data users. Some of these analyses may involve complex modeling, including interactions and nonlinear relations. Identifying such relations and encoding them in imputation models, for example, in the conditional regres...
Nonparametric Markov chain bootstrap for multiple imputation
Multiple imputation is a statistical method for analyzing data with missing values. Nonparametric Markov chain bootstrap methods can be used to generate multiple imputations of both scalar and multivariate outcome variables, under the assumption that the data are missing completely at random, and nonparametric inference can be obtained using multiple implementation bootstrap. The nonparametric ...
QSAR models to predict physico-chemical Properties of some barbiturate derivatives using molecular descriptors and genetic algorithm- multiple linear regressions
In this study the relationship between choosing appropriate descriptors by genetic algorithm to the Polarizability (POL), Molar Refractivity (MR) and Octanol/water Partition Coefficient (LogP) of barbiturates is studied. The chemical structures of the molecules were optimized using ab initio 6-31G basis set method and Polak-Ribiere algorithm with conjugated gradient within HyperChem 8.0 environ...